index.html Vis I made for Max, based on her notes for CYCLE_J:
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                     CYCLE_J v0.1 │
│                                                                            Midwestern Simulation │
│                                                                                                  │
├──────────────────────────────────────────────────────────────────────────────────────────────────┤
│ ┌─────┐                                                                                          │
│ │GPT-J│                              \    o         . 0       IMPLEMENTATION DETAILS             │
│ └┬────┘  ┌──                       \,'`. .        \  o        ─────────────────────────────────  │
│  ├───────┤ expand ctx to 4k        /,.'`        \,'`.                                            │
│  ▼       └──                         /          /,.'`         DEFAULT HYPERPARAMETERS            │
│┌──────────────┐                                   /                                              │
││INTERMEDIATE-1├───────────────────────────────────────────┐   BS  : 32         ┌─LORA:───────┐   │
│└─┬────────────┘                                           │   LR  : 1x10^-4    │rank : 8     │   │
│  │   ┌──────                                              │   WARM: 10%        │alpha: sqrt8 │   │
│  │   │ maximize p(txt_paraphrased|"{DA}{txt}{DB}")        │   MAX              └─────────────┘   │
│  │   │        + p(txt|"{DB}{txt_paraphrased}{DA}")        │   GRAD: 1.0          >O     >o       │
│  │   │   where:                                           │   CTX : 4096             >o          │
│  ├───┤         DA, DB denote markers for domain A,B       │   OPT : AdamW                        │
│  │   │         txt_paraphrased is txt paraphrased by      │         betas: 0.9,0.95  eps 1e-6    │
│  │   │            Mistral 7B Instruct v0.1 or Qwen2.5     │   SCHD: Cosine w/ Linear Warmup      │
│  │   │            Instruct                                │                                      │
│  ▼   └──────                                              │                                      │
│┌───────────────┐  ┌───────────────┐                       │   HYPERPARAMETER DIFFERENCES         │
││INTERMEDIATE-2A│┌─┤INTERMEDIATE-2B│───────────┬───────────┘                                      │
│└─┬─────────────┘│ └───────────────┘           │               INTERMEDIATE-1 : N/A (no change)   │
│  │              │ ┌───────────────────────────┴───────────┐   INTERMEDIATE-2A: N/A (no change)   │
│  │    merge     │ │ maximize p(txt_A|DA)                  │   INTERMEDIATE-2B: N/A (no change)   │
│  └──► models ◄──┘ │        + p(txt_B|DB)                  │   INTERMEDIATE-3 : N/A (no training) │
│      linearly     │   where:                              │   CYCLE-J        : NO LORA, BS 8,    │
│         │                   DA, DB markers for domain A,B                      ROLLOUTS 16,      │
│  ┌──────┘     ,                txt_A, txt_B texts sampled                      LR 1x10^-5,       │
│  │            O<     ,         from domains A, B                               MAX GRAD 0.0001   │
│  ▼                ,  o<                                                                          │
│┌──────────────┐   0<                                                                             │
││INTERMEDIATE-3│                 O     O                       I've noticed a few different       │
│└─┬────────────┘                   o o         /\              problems of mine that I could very │
│  │   ┌──────                        o       _/./              easily solve with a CYCLEGAN for   │
│  │   │  maximize rewards:             o  ,-'    `-:..-'/      text, so that's what CYCLE_J       │
│  │   │    1. cycle consistency          : o )      _  (       aims to be. I used GPT-J because   │
│  │   │       a. embedding similarity*   "`-....,--; `-.\      it's "low-background;" no chat     │
│  ├───┤       b. rouge, bleu                   `'              model interactions in its dataset  │
│  │   │    2. discriminator*               *based on NeoBERT   means no trying to circumvent      │
│  │   │       trained on real+gen samples                      refusal behavior in sensitive      │
│  │   │          during RL                                     domains, as well as less           │
│  │   └──────                                                  filtered output, which             │
│  ▼                                                            is important when working with     │
│ ┌───────┐ A model trained to translate between unpaired       real (unfiltered) data in prod.    │
│ │CYCLE-J│ domains, using model merging and policy gradients   If you don't model it, why expect  │
│ └───────┘                                                     good results when trying to        │
│                                                               transmogrify it with your model?   │
│                                                                                                  │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
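
A few sketches of how the stages above might look in code (mine, not Max's; anything not named in the panels is an assumption). First, the INTERMEDIATE-1 objective: sum the LM loss over both directions of a paraphrase pair, each wrapped in domain markers. Assumes HF Transformers; the marker strings are made up for illustration.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

    DA, DB = "<|A|>", "<|B|>"  # assumed marker strings for domains A and B

    def intermediate1_loss(txt, txt_paraphrased):
        # maximize p(txt_paraphrased|"{DA}{txt}{DB}") + p(txt|"{DB}{txt_paraphrased}{DA}")
        total = 0.0
        for prompt, target in [(f"{DA}{txt}{DB}", txt_paraphrased),
                               (f"{DB}{txt_paraphrased}{DA}", txt)]:
            ids = tok(prompt + target, return_tensors="pt").input_ids
            n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
            labels = ids.clone()
            labels[:, :n_prompt] = -100  # score only the target span
            total = total + model(input_ids=ids, labels=labels).loss
        return total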
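
INTERMEDIATE-2A/2B are the same recipe pointed at raw in-domain text behind its marker, run once per domain to get the two checkpoints (reusing tok, model, DA, DB from the sketch above):

    def intermediate2_loss(txt, marker):
        # maximize p(txt|marker); marker = DA for 2A, DB for 2B
        ids = tok(marker + txt, return_tensors="pt").input_ids
        n_prompt = tok(marker, return_tensors="pt").input_ids.shape[1]
        labels = ids.clone()
        labels[:, :n_prompt] = -100
        return model(input_ids=ids, labels=labels).loss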
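
"Merge models linearly" I read as a parameter-wise weighted average of the 2A and 2B checkpoints; the 50/50 split is an assumption, the notes don't give a ratio.

    def merge_linear(state_a, state_b, w=0.5):
        # INTERMEDIATE-3 = w * 2A + (1 - w) * 2B, parameter by parameter
        assert state_a.keys() == state_b.keys()
        return {k: w * state_a[k] + (1.0 - w) * state_b[k] for k in state_a}

    # model_3.load_state_dict(merge_linear(model_2a.state_dict(), model_2b.state_dict()))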
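
The CYCLE-J reward, sketched: cycle consistency compares the original text with its A->B->A round trip (embedding similarity via NeoBERT, plus ROUGE and BLEU), and a NeoBERT-based discriminator trained on real+generated samples during RL scores realism. The unweighted sum and the embed/discriminator callables are my assumptions.

    import torch
    from rouge_score import rouge_scorer
    from sacrebleu import sentence_bleu

    _rouge = rouge_scorer.RougeScorer(["rougeL"])

    def reward(txt, txt_roundtrip, embed, discriminator):
        emb_sim = torch.nn.functional.cosine_similarity(
            embed(txt), embed(txt_roundtrip), dim=-1).item()         # 1a. embedding similarity
        rouge = _rouge.score(txt, txt_roundtrip)["rougeL"].fmeasure  # 1b. ROUGE
        bleu = sentence_bleu(txt_roundtrip, [txt]).score / 100.0     # 1b. BLEU
        realism = discriminator(txt_roundtrip)                       # 2.  discriminator p(real)
        return emb_sim + rouge + bleu + realism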
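
The policy-gradient step, in the plainest REINFORCE reading I can give it (reusing tok, model, DA, DB and reward() from above): roll out A->B, roll the sample back B->A, score the round trip, and weight the sampled tokens' log-prob by the reward. The notes don't pin down the estimator; no baseline or batching shown here, where the real run uses BS 8 and 16 rollouts.

    def reinforce_step(txt_a, embed, discriminator):
        # A -> B rollout (sampled, so there is something to reinforce)
        prompt = tok(f"{DA}{txt_a}{DB}", return_tensors="pt").input_ids
        seqs = model.generate(prompt, do_sample=True, max_new_tokens=256)
        txt_b = tok.decode(seqs[0, prompt.shape[1]:], skip_special_tokens=True)
        # B -> A round trip, just to compute the reward
        back = tok(f"{DB}{txt_b}{DA}", return_tensors="pt").input_ids
        out = model.generate(back, do_sample=True, max_new_tokens=256)
        txt_a2 = tok.decode(out[0, back.shape[1]:], skip_special_tokens=True)
        r = reward(txt_a, txt_a2, embed, discriminator)
        # mean log-prob of the sampled B tokens under the current policy
        labels = seqs.clone()
        labels[:, :prompt.shape[1]] = -100
        logp = -model(input_ids=seqs, labels=labels).loss
        (-r * logp).backward()  # ascend r * logp; clip to MAX GRAD before stepping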
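
And the DEFAULT HYPERPARAMETERS panel wired up, assuming HF Transformers and PEFT on the model from the first sketch; "alpha: sqrt8" I'm reading as rank-stabilized-style scaling, and total_steps is a placeholder:

    import math
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import get_cosine_schedule_with_warmup

    peft_model = get_peft_model(model, LoraConfig(r=8, lora_alpha=math.sqrt(8),
                                                  task_type="CAUSAL_LM"))
    opt = torch.optim.AdamW(peft_model.parameters(),
                            lr=1e-4, betas=(0.9, 0.95), eps=1e-6)
    total_steps = 10_000  # placeholder; set from dataset size at BS 32
    sched = get_cosine_schedule_with_warmup(opt,
                                            num_warmup_steps=int(0.10 * total_steps),
                                            num_training_steps=total_steps)

    # each step: loss.backward(), clip to MAX GRAD 1.0, then step everything
    # torch.nn.utils.clip_grad_norm_(peft_model.parameters(), max_norm=1.0)
    # opt.step(); sched.step(); opt.zero_grad()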